Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
نویسندگان
چکیده
This paper describes a method to automatically create dialogue resources annotated with dialogue act information by reusing existing dialogue corpora. Numerous dialogue corpora are available for research purposes and many of them are annotated with dialogue act information that captures the intentions encoded in user utterances. Annotated dialogue resources, however, differ in various respects: data collection settings and modalities used, dialogue task domains and scenarios (if any) underlying the collection, number and roles of dialogue participants involved and dialogue act annotation schemes applied. The presented study encompasses three phases of data-driven investigation. We, first, assess the importance of various types of features and their combinations for effective cross-domain dialogue act classification. Second, we establish the best predictive model comparing various cross-corpora training settings. Finally, we specify models adaptation procedures and explore late fusion approaches to optimize the overall classification decision taking process. The proposed methodology accounts for empirically motivated and technically sound classification procedures that may reduce annotation and training costs significantly.
منابع مشابه
Investigating the Portability of Corpus-Derived Cue Phrases for Dialogue Act Classification
We present recent work in the area of Cross-Domain Dialogue Act tagging. Our experiments investigate the use of a simple dialogue act classifier based on purely intra-utterance features principally involving word n-gram cue phrases. We apply automatically extracted cues from one corpus to a new annotated data set, to determine the portability and generality of the cues we learn. We show that ou...
متن کاملCross-Domain Dialogue Act Tagging
We present recent work in the area of Cross-Domain Dialogue Act (DA) tagging. We have previously reported on the use of a simple dialogue act classifier based on purely intra-utterance features — principally involving word n-gram cue phrases automatically generated from a training corpus. Such a classifier performs surprisingly well, rivalling scores obtained using far more sophisticated langua...
متن کاملTransfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
Spoken conversation corpora often adapt existing Dialogue Act (DA) annotation specifications, such as DAMSL, DIT++, etc., to task specific needs, yielding incompatible annotations; thus, limiting corpora re-usability. Recently accepted ISO standard for DA annotation – Dialogue Act Markup Language (DiAML) – is designed as domain and application independent. Moreover, the clear separation of dial...
متن کاملDialogue Act Recognition using Reweighted Speaker Adaptation
In this work we study the effectiveness of speaker adaptation for dialogue act recognition in multiparty meetings. First, we analyze idiosyncracy in dialogue verbal acts by qualitatively studying the differences and conflicts among speakers and by quantitively comparing speaker-specific models. Based on these observations, we propose a new approach for dialogue act recognition based on reweight...
متن کاملHierarchical Dialogue Act Classification in Online Tutoring Sessions
As the corpora of online tutoring sessions grow by orders of magnitude, dialogue act classification can be used to capture increasingly fine-grained details about events during tutoring. In this paper, we apply machine learning to build models that can classify 133 (126 defined acts plus 7 to represent unknown and undefined acts) possible dialogue acts in tutorial dialog from online tutoring se...
متن کامل